AITopics | Kisumu


UrbanDataLayer: A Unified Data Pipeline for Urban Science

Neural Information Processing Systems, Feb-7-2026, 21:13:44 GMT

On the one hand, the diverse data processing steps lead to the lack of large-scale benchmarks and therefore decelerate iterative methodology improvement on a single problem.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York (0.05)
Africa > Kenya > Nakuru County > Nakuru (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)


0db7f135f6991e8cec5e516ecc66bfba-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-9-2025, 18:29:41 GMT

artificial intelligence, prediction, proceedings, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.06)
North America > United States > New York > New York County > New York City (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)
(7 more...)

Industry:

Transportation > Infrastructure & Services (0.93)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)


Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era

Hao, Xixuan, Jiang, Yutian, Zou, Xingchen, Liu, Jiabo, Yin, Yifang, Liang, Yuxuan

arXiv.org Artificial Intelligence, May-16-2025

Location Intelligence (LI), the science of transforming location-centric geospatial data into actionable knowledge, has become a cornerstone of modern spatial decision-making. The rapid evolution of Geospatial Representation Learning is fundamentally reshaping LI development through two successive technological revolutions: the deep learning breakthrough and the emerging large language model (LLM) paradigm. While deep neural networks (DNNs) have demonstrated remarkable success in automated feature extraction from structured geospatial data (e.g., satellite imagery, GPS trajectories), the recent integration of LLMs introduces transformative capabilities for cross-modal geospatial reasoning and unstructured geo-textual data processing. This survey presents a comprehensive review of geospatial representation learning across both technological eras, organizing them into a structured taxonomy based on the complete pipeline comprising: (1) data perspective, (2) methodological perspective and (3) application perspective. We also highlight current advancements, discuss existing limitations, and propose potential future research directions in the LLM era. This work offers a thorough exploration of the field and provides a roadmap for further innovation in LI. The summary of the up-to-date paper list can be found at https://github.com/CityMind-Lab/Awesome-Location-Intelligence and will undergo continuous updates.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.09651

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > District of Columbia > Washington (0.05)
Asia > China > Shanghai > Shanghai (0.05)
(29 more...)

Genre: Overview (1.00)

Industry:

Transportation (0.67)
Banking & Finance > Real Estate (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.38)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

Etori, Naome A., Gini, Maria L.

arXiv.org Artificial Intelligence, Feb-10-2025

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and the major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult as Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms other models: for sentiment analysis, the XLM-R supervised model achieves the highest accuracy (69.2%) and F1 score (66.1%), while the XLM-R semi-supervised model reaches 67.2% accuracy and a 64.1% F1 score. In emotion analysis, DistilBERT supervised leads in accuracy (59.8%) and F1 score (31%), while mBERT semi-supervised reaches 59% accuracy and a 26.5% F1 score. AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with Afri-BERT showing the highest bias and unique sensitivity to empathy emotion. https://github.com/NEtori21/Ride_hailing

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2024.wassa-1.19

2502.0618

Country:

Africa > Kenya > Nairobi City County > Nairobi (0.07)
Africa > Kenya > Nairobi Province (0.06)
Africa > Kenya > Mombasa County > Mombasa (0.05)
(18 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Transportation > Passenger (1.00)
Information Technology (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)


Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

Mondini, Roberto, Kotonya, Neema, Logan, Robert L. IV, Olson, Elizabeth M, Lungati, Angela Oduor, Odongo, Daniel Duke, Ombasa, Tim, Lamba, Hemank, Cahill, Aoife, Tetreault, Joel R., Jaimes, Alejandro

arXiv.org Artificial Intelligence, Dec-17-2024

Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.

computational linguistic, dataset, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2412.13098

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Kenya > Bomet County > Bomet (0.05)
(35 more...)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Voting & Elections (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Gema, Aryo Pradipta, Jin, Chen, Abdulaal, Ahmed, Diethe, Tom, Teare, Philip, Alex, Beatrice, Minervini, Pasquale, Saseendran, Amrutha

arXiv.org Artificial Intelligence, Oct-24-2024

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).
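The contrast between the base and masked model described above can be sketched for a single decoding step. This is a minimal illustration, not the authors' implementation: the abstract does not give the scoring formula, so the form used here, (1 + alpha) * log p_base - alpha * log p_masked with alpha set to the entropy of the base distribution, is an assumption, and the function names and toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(log_probs):
    """Shannon entropy (in nats) of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def contrast_step(base_logits, masked_logits):
    """Pick the next token by contrasting base and masked model outputs.

    Assumed scoring rule (not taken from the abstract):
        score = (1 + alpha) * log p_base - alpha * log p_masked
    where alpha is the entropy of the base model's next-token distribution,
    so the contrast is stronger when the base model is less certain.
    """
    base_lp = log_softmax(base_logits)
    masked_lp = log_softmax(masked_logits)
    alpha = entropy(base_lp)
    scores = [(1 + alpha) * b - alpha * m for b, m in zip(base_lp, masked_lp)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, alpha


# Toy vocabulary of three tokens. The base model slightly prefers token 0,
# but masking the retrieval heads sharply lowers token 1, which suggests
# token 1 is the one supported by the context.
token, alpha = contrast_step([2.0, 1.9, 0.0], [2.0, 0.0, 0.0])
```

In this toy case the contrast promotes the token whose probability drops most when the retrieval heads are masked; when the two models agree exactly, the scores reduce to the base log-probabilities and the base model's choice is unchanged.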

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.1886

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (0.92)
Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Kenyan Sign Language (KSL) Dataset: Using Artificial Intelligence (AI) in Bridging Communication Barrier among the Deaf Learners

Wanzare, Lilian, Okutoyi, Joel, Kang'ahi, Maurine, Ayere, Mildred

arXiv.org Artificial Intelligence, Oct-23-2024

Kenyan Sign Language (KSL) is the primary language used by the deaf community in Kenya. It is the medium of instruction from Pre-primary 1 to university among deaf learners, facilitating their education and academic achievement. Kenyan Sign Language is used for social interaction, expression of needs, making requests and general communication among persons who are deaf in Kenya. However, there exists a language barrier between the deaf and the hearing people in Kenya. Thus, the innovation on AI4KSL is key in eliminating the communication barrier. Artificial intelligence for KSL is a two-year research project (2023-2024) that aims to create a digital open-access AI of spontaneous and elicited data from a representative sample of the Kenyan deaf community. The purpose of this study is to develop an AI assistive technology dataset that translates English to KSL as a way of fostering inclusion and bridging language barriers among deaf learners in Kenya. The specific objectives are to build a KSL dataset for spoken English and video-recorded Kenyan Sign Language, and to build transcriptions of the KSL signs to a phonetic-level interface of the sign language. In this paper, the methodology for building the dataset is described. Data was collected from 48 teachers and tutors of the deaf learners and 400 learners who are Deaf. Participants engaged mainly in sign language elicitation tasks through reading and singing. The first outcome of the dataset consists of about 14,000 English sentences with corresponding KSL Gloss derived from a pool of about 4,000 words, and about 20,000 signed KSL videos that are either signed words or sentences. The second outcome consists of 10,000 split and segmented KSL videos. The third outcome consists of 4,000 words transcribed into five articulatory parameters according to the HamNoSys system.

artificial intelligence, machine translation, natural language, (13 more...)

arXiv.org Artificial Intelligence

2410.18295

Country:

Asia > Pakistan (0.04)
North America > United States > Hawaii (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(8 more...)

Genre:

Overview (0.68)
Research Report (0.64)
Instructional Material (0.46)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)


Artificial Intelligence for Public Health Surveillance in Africa: Applications and Opportunities

Tshimula, Jean Marie, Kalengayi, Mitterrand, Makenga, Dieumerci, Lilonge, Dorcas, Asumani, Marius, Madiya, Déborah, Kalonji, Élie Nkuba, Kanda, Hugues, Galekwa, René Manassé, Kumbu, Josias, Mikese, Hardy, Tshimula, Grace, Muabila, Jean Tshibangu, Mayemba, Christian N., Nkashama, D'Jeff K., Kalala, Kalonji, Ataky, Steve, Basele, Tighana Wenge, Didier, Mbuyi Mukendi, Kasereka, Selain K., Dialufuma, Maximilien V., Kumwita, Godwill Ilunga Wa, Muyuku, Lionel, Kimpesa, Jean-Paul, Muteba, Dominique, Abedi, Aaron Aruna, Ntobo, Lambert Mukendi, Bundutidi, Gloria M., Mashinda, Désiré Kulimba, Mpinga, Emmanuel Kabengele, Kasoro, Nathanaël M.

arXiv.org Artificial Intelligence, Aug-5-2024

Artificial Intelligence (AI) is revolutionizing various fields, including public health surveillance. In Africa, where health systems frequently encounter challenges such as limited resources, inadequate infrastructure, failed health information systems and a shortage of skilled health professionals, AI offers a transformative opportunity. This paper investigates the applications of AI in public health surveillance across the continent, presenting successful case studies and examining the benefits, opportunities, and challenges of implementing AI technologies in African healthcare settings. Our paper highlights AI's potential to enhance disease monitoring and health outcomes, and support effective public health interventions. The findings presented in the paper demonstrate that AI can significantly improve the accuracy and timeliness of disease detection and prediction, optimize resource allocation, and facilitate targeted public health strategies. Additionally, our paper identified key barriers to the widespread adoption of AI in African public health systems and proposed actionable recommendations to overcome these challenges.

africa, outbreak, prediction, (15 more...)

arXiv.org Artificial Intelligence

2408.02575

Country:

Africa > Sub-Saharan Africa (0.05)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Africa > West Africa (0.05)
(78 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
(6 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(7 more...)


COVID-19 Twitter Sentiment Classification Using Hybrid Deep Learning Model Based on Grid Search Methodology

Tembhurne, Jitendra, Agrawal, Anant, Lakhotia, Kirtan

arXiv.org Artificial Intelligence, Jun-11-2024

In the contemporary era, social media platforms amass an extensive volume of social data contributed by their users. In order to promptly grasp the opinions and emotional inclinations of individuals regarding a product or event, it becomes imperative to perform sentiment analysis on the user-generated content. Microblog comments often encompass both lengthy and concise text entries, presenting a complex scenario. This complexity is particularly pronounced in extensive textual content due to its rich content and intricate word interrelations compared to shorter text entries. Sentiment analysis of public opinion shared on social networking websites such as Facebook or Twitter has evolved and found diverse applications. However, several challenges remain to be tackled in this field. Hybrid methodologies have emerged as promising models for mitigating sentiment analysis errors, particularly when dealing with progressively intricate training data. In this article, to investigate COVID-19 vaccine hesitancy, we propose eight different hybrid deep learning models for sentiment classification with the aim of improving the overall accuracy of the model. The sentiment prediction is achieved using an embedding, a deep learning model and a grid search (GS) algorithm on a Twitter COVID-19 dataset. According to the study, public sentiment towards COVID-19 immunization appears to be improving with time, as evidenced by the gradual decline in vaccine reluctance. Through extensive evaluation, the proposed model reported an increased accuracy of 98.86%, outperforming other models. Specifically, the combination of BERT, CNN and GS yields the highest accuracy, while the combination of GloVe, BiLSTM, CNN and GS follows closely behind with an accuracy of 98.17%. In addition, the proposed model reports an increase in accuracy in the range of 2.11% to 14.46% in comparison with existing works.

accuracy, hybrid model, sentiment analysis, (13 more...)

arXiv.org Artificial Intelligence

2406.10266

Country:

Asia > India (0.05)
Asia > Singapore (0.04)
Oceania > New Zealand (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Chang, Yapei, Lo, Kyle, Goyal, Tanya, Iyyer, Mohit

arXiv.org Artificial Intelligence, Oct-5-2023

Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than the oft-repetitive ones generated by LLaMA 2. Incremental updating yields lower BooookScore but higher level of detail than hierarchical merging, a trade-off sometimes preferred by human annotators. We release code and annotations after blind review to spur more principled research on book-length summarization.
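The metric's definition in the abstract, the proportion of summary sentences that contain none of the identified error types, can be written down directly. The sketch below assumes the per-sentence error annotations are already available as input (how they are produced is outside this example), and the function name and error-type labels are hypothetical placeholders.

```python
def booookscore_sketch(sentence_errors):
    """Return the fraction of sentences with no annotated coherence error.

    `sentence_errors` holds one entry per summary sentence; each entry is a
    collection of error-type labels found in that sentence (empty if none).
    """
    if not sentence_errors:
        raise ValueError("summary must contain at least one sentence")
    clean = sum(1 for errors in sentence_errors if not errors)
    return clean / len(sentence_errors)


# Four sentences, one of which carries a (placeholder) error label.
annotations = [[], ["error_type_a"], [], []]
score = booookscore_sketch(annotations)  # 3 clean sentences out of 4
```

Because the score is a simple proportion, it is comparable across summaries of different lengths, and a sentence with several error types counts the same as a sentence with one.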

annotation, fantasy, summarization, (16 more...)

arXiv.org Artificial Intelligence

2310.00785

Country:

Europe > United Kingdom (0.14)
North America > Canada > Ontario > Toronto (0.05)
Africa > Uganda > Central Region > Kampala (0.04)
(13 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.45)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
